Introduction

Welcome to my first project on data visualization and communication. The project is divided into four sub-projects, and we will begin with the first one. In this sub-project, I will use several visualization tools covered in the EPFL extension course to highlight trends and variability in a time series of daily values of the SMI (Swiss Market Index). So let’s dive in!

Part 1: Lineplot of all daily SMI values

In this first part, let’s see what the SMI looks like as a line plot.

Wait, what’s this huge drop at the beginning of 2020? Let’s take a closer look at the 2019-2020 period.

Part 2: Lineplot since January 2019

Okay, that’s better. The index seems to have trended upward throughout 2019, only to fall suddenly back to its starting level in early 2020. Let’s draw a horizontal line at CHF 8900 so that we can compare the two periods more easily.

Part 3: Lineplot of daily SMI values with a threshold at CHF 8900
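A minimal ggplot2 sketch of such a plot, assuming the SMI data are in a data frame with a `date` column (class `Date`) and a `close` column. Both column names and the handful of toy values are hypothetical stand-ins, not the course dataset. A threshold at a fixed index level is a horizontal line, so it is drawn with `geom_hline()`.

```r
library(ggplot2)

# Toy stand-in for the daily SMI series (hypothetical values, not real data)
smi <- data.frame(
  date  = as.Date(c("2018-12-15", "2019-02-15", "2019-07-15",
                    "2019-12-15", "2020-03-15", "2020-06-15")),
  close = c(8500, 8800, 9500, 10000, 8600, 9800)
)

# Part 2: keep only observations from January 2019 onward
smi_2019 <- subset(smi, date >= as.Date("2019-01-01"))

# Part 3: line plot plus a dashed horizontal reference line at CHF 8900
p <- ggplot(smi_2019, aes(x = date, y = close)) +
  geom_line() +
  geom_hline(yintercept = 8900, linetype = "dashed", colour = "red") +
  labs(x = "Date", y = "SMI (CHF)",
       title = "Daily SMI values with a threshold at CHF 8900")
```

Printing `p` displays the plot; passing it to `plotly::ggplotly()` is one way to make it interactive.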

We can now roughly estimate how long the index stayed above the threshold. To find the exact dates at which it crossed the line, we will make the plot interactive with plotly, so that hovering the mouse over the line shows the date and value of each point.

Part 4: Time interval above CHF 8900 between 2019 and 2020

When I hover over the points where the line crosses the threshold in the plotly version, I get January 28, 2019 and March 12, 2020. That’s nice. But what if we want to get the same information directly with R code?

Part 5: Finding the same dates with R code

The last date in 2019 on which the SMI was below CHF 8900 was 2019-01-28, and the first date in 2020 on which it fell back below CHF 8900 was 2020-03-12.
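One way to get these dates in base R, sketched under the assumption that the stretch of interest is the longest uninterrupted run of days above the threshold. The helper `threshold_run_bounds()` and the toy series are hypothetical, written for illustration only: `rle()` splits the above/below sequence into runs, and the function returns the dates just outside the longest above-threshold run.

```r
# Hypothetical helper (not from the course): locate the longest uninterrupted
# run of observations strictly above `threshold` and return the dates around it.
# Assumes `dates` are sorted and `values` contain no missing values.
threshold_run_bounds <- function(dates, values, threshold) {
  runs   <- rle(values > threshold)      # runs of TRUE (above) / FALSE
  ends   <- cumsum(runs$lengths)         # index of the last day of each run
  starts <- ends - runs$lengths + 1      # index of the first day of each run
  above  <- which(runs$values)           # runs lying above the threshold
  if (length(above) == 0) return(NULL)   # never above: nothing to report
  best <- above[which.max(runs$lengths[above])]
  list(
    # last date at or below the threshold before the run starts
    last_below  = if (starts[best] > 1) dates[starts[best] - 1] else NA,
    # first date at or below the threshold after the run ends
    first_below = if (ends[best] < length(dates)) dates[ends[best] + 1] else NA
  )
}

# Toy example (hypothetical values, not real SMI data)
dates  <- seq(as.Date("2021-01-04"), by = "day", length.out = 8)
values <- c(8800, 8850, 8950, 9000, 9100, 8990, 8700, 8950)
res <- threshold_run_bounds(dates, values, 8900)
res
```

With the real data, a call such as `threshold_run_bounds(smi$date, smi$close, 8900)` (hypothetical column names) should return 2019-01-28 and 2020-03-12, provided that stretch is indeed the longest one above CHF 8900.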

And how long does this period represent? I’m pretty bad at counting days; I don’t even know by heart which months have 31 days and which have 30. Let’s calculate it with R and the lubridate package.

Part 6: Number of days between the two threshold crossings

The two dates are 409 days apart: the SMI stayed above CHF 8900 for more than 13 months.
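The day count can be checked in base R: subtracting two `Date` objects takes care of month lengths and of the leap day 2020-02-29. The commented lubridate line is an equivalent formulation, assuming the package is installed.

```r
# The two threshold-crossing dates reported in Part 5
last_below  <- as.Date("2019-01-28")
first_below <- as.Date("2020-03-12")

# Base R: difference between two Date objects, expressed in days
gap_days <- as.numeric(difftime(first_below, last_below, units = "days"))
print(gap_days)  # 409

# Equivalent with lubridate (requires the package to be installed):
# lubridate::time_length(lubridate::interval(last_below, first_below), "day")
```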

Part 7: SMI weekly mean values since January 2019

Let’s now find out the weekly mean values of the SMI since January 2019.
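A minimal base-R sketch of weekly means, assuming a data frame with `date` and `close` columns (hypothetical names) and using toy values for two trading weeks. `cut()` on a `Date` vector with `breaks = "week"` labels each day with the Monday that starts its week, and `aggregate()` averages within each label.

```r
# Toy daily series over two trading weeks (hypothetical values, not real SMI data)
days <- seq(as.Date("2019-01-07"), as.Date("2019-01-18"), by = "day")
days <- days[!format(days, "%u") %in% c("6", "7")]   # drop Saturdays and Sundays
smi  <- data.frame(
  date  = days,
  close = c(8800, 8810, 8820, 8830, 8840, 8900, 8910, 8920, 8930, 8940)
)

# Label each day with the Monday that starts its week, then average per week
smi$week <- cut(smi$date, breaks = "week")   # weeks start on Monday by default
weekly   <- aggregate(close ~ week, data = smi, FUN = mean)
weekly
```

Replacing `breaks = "week"` with `breaks = "month"` gives monthly means in the same way.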

Part 8: Timeplot of daily SMI values vs weekly mean values

Part 9: SMI monthly mean values since January 2019

Let’s do the same but this time with monthly values.

Part 10: Timeplot of daily SMI values vs monthly mean values

While the variability within each month was roughly similar throughout 2019, it increased considerably in March and April 2020.
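To put a number on this within-month variability, one option (not necessarily the one used in the course) is the standard deviation of the daily values in each month. The sketch below uses a small synthetic series in index points, one calm month followed by a volatile one, so the values are illustrative only; the column names are hypothetical.

```r
# Synthetic daily series (illustrative values, not real SMI data)
smi <- data.frame(
  date  = as.Date(c("2020-02-03", "2020-02-10", "2020-02-17", "2020-02-24",
                    "2020-03-02", "2020-03-09", "2020-03-16", "2020-03-23")),
  close = c(100, 102, 101, 103, 100, 90, 80, 95)
)
smi$month <- format(smi$date, "%Y-%m")   # e.g. "2020-03"

# Monthly mean level and within-month standard deviation (variability)
monthly_mean <- tapply(smi$close, smi$month, mean)
monthly_sd   <- tapply(smi$close, smi$month, sd)
cbind(mean = monthly_mean, sd = monthly_sd)
```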

Part 11: Boxplots of daily SMI values since January 2019

The boxplots show this variability even more clearly. March 2020 had by far the greatest variability, while February 2020 had the highest median level. This is not surprising, since this is when COVID-19 started to wreak havoc. Although the SMI has since recovered to its pre-COVID level, the health and social consequences are still very much present.
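A ggplot2 sketch of monthly boxplots, using a synthetic random walk in place of the real SMI series (the data frame and column names are hypothetical). Each box covers the interquartile range of the daily values in a month, and the line inside it marks the median.

```r
library(ggplot2)

# Synthetic daily series (random walk, NOT real SMI data), January to April 2020
set.seed(42)
toy <- data.frame(date = seq(as.Date("2020-01-01"), as.Date("2020-04-30"), by = "day"))
toy$close <- 100 + cumsum(rnorm(nrow(toy)))
toy$month <- factor(format(toy$date, "%Y-%m"))   # one box per calendar month

# One boxplot per month: box = interquartile range, middle line = median
p <- ggplot(toy, aes(x = month, y = close)) +
  geom_boxplot() +
  labs(x = "Month", y = "Index level",
       title = "Daily values by month (synthetic data)")
```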

Part 12: Analysis with new data

For the final part, I get to choose my own data and do some analysis. Although it’s the most challenging part, it’s also the most interesting. Here are the milestones to reach:

  1. Identify a question I’d like to answer.
  2. Find a dataset that will help to answer this question.
  3. Produce one or more graphs using the ggplot2 package.
  4. Answer the question by commenting on the graph obtained.

1. Identify a question

I would like to know on which date Switzerland recorded the most new COVID-19 cases.

2. Find a dataset

For this, I found a dataset on www.ourworldindata.org.

3. Produce one or more graphs using the ggplot2 package

The dataset is huge. Let’s keep only what interests us: the number of new cases in Switzerland.
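A base-R sketch of this filtering step, assuming the Our World in Data file has already been read into a data frame called `owid` (for example with `read.csv()`). The column names `location`, `date`, `total_cases` and `new_cases` are those of that dataset; the Swiss values are the first rows of the table shown further down, and the row labelled "Other" is invented filler. With dplyr loaded, `dplyr::filter(owid, location == "Switzerland")` does the same row selection.

```r
# Toy extract mimicking a few columns of the Our World in Data COVID file
owid <- data.frame(
  location    = c("Switzerland", "Switzerland", "Other", "Switzerland"),
  date        = c("2020-02-25", "2020-02-26", "2020-02-26", "2020-02-27"),
  total_cases = c(1, 1, 2, 8),
  new_cases   = c(1, 0, 2, 7),
  stringsAsFactors = FALSE
)

# Keep only the Swiss rows and the columns we need, then parse the dates
ch <- subset(owid, location == "Switzerland",
             select = c(date, total_cases, new_cases))
ch$date <- as.Date(ch$date)
ch
```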

After a first draft, I notice that there are still some problems in my data. First, there are a number of missing values (NAs). Second, since new cases are not reported on weekends, the daily count drops to zero on those days. It is therefore better to use a 7-day average, where each value is the mean of the current day and the six preceding days. Let’s tidy the data a bit more. Here are the first rows of the table with the 7-day average:

date        total_cases  new_cases  cases_7_mean
2020-02-25            1          1            NA
2020-02-26            1          0            NA
2020-02-27            8          7            NA
2020-02-28            8          0            NA
2020-02-29           18         10            NA
2020-03-01           27          9            NA
2020-03-02           42         15      6.000000
2020-03-03           56         14      7.857143
2020-03-04           90         34     12.714286
2020-03-05          114         24     15.142857
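The `cases_7_mean` column is a trailing 7-day mean: each value averages the current day and the six days before it, which is why the first six rows are NA. A base-R sketch with `stats::filter()` reproduces the four non-missing values in the table; `zoo::rollmean()` with `align = "right"` would be an alternative if the zoo package is available. The data frame name `ch` is hypothetical.

```r
# First ten rows of the Swiss data, as in the table above
ch <- data.frame(
  date      = seq(as.Date("2020-02-25"), by = "day", length.out = 10),
  new_cases = c(1, 0, 7, 0, 10, 9, 15, 14, 34, 24)
)

# Trailing 7-day mean: average of the current day and the 6 days before it.
# stats:: is spelled out because dplyr::filter() masks it when dplyr is loaded.
# Any NA inside a window makes that mean NA, so missing values must be handled first.
ch$cases_7_mean <- as.numeric(stats::filter(ch$new_cases, rep(1 / 7, 7), sides = 1))
ch
```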

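According to the graph, the date with the most new cases was 2020-11-02, with 21,926 cases. Since that was a Monday, this peak probably also includes cases from the preceding weekend, when no new cases were reported. The highest 7-day average was reached on 2020-11-06, with 8,237 new cases per day.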
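Finding the peak dates is a job for `which.max()`, which returns the position of the first maximum and skips NAs. The sketch runs only on the ten rows from the table above, so here it returns 2020-03-04 (34 new cases) and 2020-03-05 (7-day mean of about 15.1); on the full Swiss series the same lines should give the dates quoted above. The data frame name `ch` is hypothetical.

```r
# The ten rows from the table above (the full dataset runs much longer)
ch <- data.frame(
  date         = seq(as.Date("2020-02-25"), by = "day", length.out = 10),
  new_cases    = c(1, 0, 7, 0, 10, 9, 15, 14, 34, 24),
  cases_7_mean = c(rep(NA, 6), 6.000000, 7.857143, 12.714286, 15.142857)
)

# which.max() returns the index of the first maximum and ignores NAs
peak_day  <- ch[which.max(ch$new_cases), c("date", "new_cases")]
peak_mean <- ch[which.max(ch$cases_7_mean), c("date", "cases_7_mean")]
peak_day
peak_mean
```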

It is also interesting to note that the first COVID-19 cases in Switzerland were recorded at the end of February 2020 and rose sharply in March 2020, which supports our explanation for the fall of the SMI during this period in Part 11.

Just for fun, and since the data are directly available, let’s plot the cumulative number of COVID-19 cases in Switzerland.
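The cumulative curve is the running sum of the daily new cases. As a quick consistency check on the ten rows from the table above, `cumsum()` reproduces the `total_cases` column exactly. Note that `cumsum()` turns every value after a missing one into NA, which is another reason to deal with the NAs first. Plotting `total_cases` against `date` with ggplot2’s `geom_line()` then gives the cumulative curve.

```r
# New daily cases for the first ten days (from the table above)
new_cases <- c(1, 0, 7, 0, 10, 9, 15, 14, 34, 24)

# The cumulative count is the running sum of the daily new cases
cumulative <- cumsum(new_cases)
cumulative
```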